arXiv:cmp-lg/9605014v1 12 May 1996 Clustering Words with the MDL Principle
Abstract
We address the problem of automatically constructing a thesaurus by clustering words based on corpus data. We view this problem as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose a learning algorithm based on the Minimum Description Length (MDL) Principle for such estimation. We empirically compared the performance of our method based on the MDL Principle against the Maximum Likelihood Estimator in word clustering, and found that the former outperforms the latter. We also evaluated the method by conducting pp-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that such a thesaurus can be used to improve accuracy in disambiguation.
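To make the contrast between MDL and maximum likelihood concrete, here is a minimal Python sketch of a generic two-part MDL score for a pair of partitions. It is not taken from the paper: the function name `description_length`, the assumption that words are uniformly distributed within a class, and the `(k / 2) * log2(N)` parameter cost are illustrative choices based on the standard two-part MDL formulation.

```python
import math
from collections import Counter


def description_length(pairs, noun_classes, verb_classes):
    """Return (data_bits, model_bits) for a noun/verb clustering.

    Assumptions (hypothetical, for illustration only):
    - P(n, v) = P(Cn, Cv) / (|Cn| * |Cv|), i.e. uniform within a class;
    - P(Cn, Cv) is estimated by relative frequency;
    - the model cost is (k / 2) * log2(N) with k free parameters.
    """
    n_total = len(pairs)
    # Map each word to the index of the class that contains it.
    noun_of = {w: i for i, c in enumerate(noun_classes) for w in c}
    verb_of = {w: i for i, c in enumerate(verb_classes) for w in c}
    # Count co-occurrences at the class level.
    counts = Counter((noun_of[n], verb_of[v]) for n, v in pairs)

    # Data description length: negative log-likelihood in bits.
    data_bits = 0.0
    for (i, j), c in counts.items():
        p_cell = c / n_total
        p_pair = p_cell / (len(noun_classes[i]) * len(verb_classes[j]))
        data_bits -= c * math.log2(p_pair)

    # Model description length: one cost unit per free cell probability.
    k = len(noun_classes) * len(verb_classes) - 1
    model_bits = 0.5 * k * math.log2(n_total)
    return data_bits, model_bits


# Toy corpus of (noun, verb) pairs; invented data, not from the paper.
pairs = [("wine", "drink"), ("beer", "drink"),
         ("bread", "eat"), ("rice", "eat")] * 2
verbs = [{"drink"}, {"eat"}]
fine = [{"wine"}, {"beer"}, {"bread"}, {"rice"}]
coarse = [{"wine", "beer"}, {"bread", "rice"}]

for name, nouns in (("fine", fine), ("coarse", coarse)):
    data_bits, model_bits = description_length(pairs, nouns, verbs)
    print(name, data_bits, model_bits, data_bits + model_bits)
```

On this toy corpus both partitions fit the data equally well, so likelihood alone cannot separate them, while the model-cost term favors the coarser clustering. That is the general intuition behind preferring MDL over maximum likelihood for choosing the number of clusters.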
Similar resources
arXiv:cmp-lg/9605018v1 13 May 1996 Efficient Tabular LR Parsing
We give a new treatment of tabular LR parsing, which is an alternative to Tomita’s generalized LR algorithm. The advantage is twofold. Firstly, our treatment is conceptually more attractive because it uses simpler concepts, such as grammar transformations and standard tabulation techniques also known as chart parsing. Secondly, the static and dynamic complexity of parsing, both in space and time...
arXiv:alg-geom/9502026v2 9 May 1995 ALGEBRAIC SURFACES AND SEIBERG-WITTEN INVARIANTS
arXiv:q-alg/9705012v1 16 May 1997 Poisson structures on the center
It is shown that the elliptic algebra Aq,p(ŝl(2)c) has a nontrivial center at the critical level c = −2, generalizing the result of Reshetikhin and Semenov-Tian-Shansky for trigonometric algebras. A family of Poisson structures indexed by a non-negative integer k is constructed on this center.
arXiv:q-alg/9605033v1 21 May 1996 CRM-2278 March 1995 q-Ultraspherical Polynomials for q a Root of Unity
Properties of the q-ultraspherical polynomials for q being a primitive root of unity are derived using a formalism of the so_q(3) algebra. The orthogonality condition for these polynomials provides a new class of trigonometric identities representing discrete finite-dimensional analogs of q-beta integrals of Ramanujan. Mathematics Subject Classifications (1991): 17B37, 33D80
arXiv:q-alg/9605008v1 5 May 1996 QUANTUM PRINCIPAL BUNDLES & THEIR CHARACTERISTIC CLASSES
A general theory of characteristic classes of quantum principal bundles is sketched, incorporating basic ideas of classical Weil theory into the conceptual framework of non-commutative differential geometry. A purely cohomological interpretation of the Weil homomorphism is given, together with a geometrical interpretation via quantum invariant polynomials. A natural spectral sequence is describ...